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Abstract 

This paper addresses the use of experimental data for calibrating a 
computer model and improving its predictions of the underlying physi- 
cal system. A global statistical approach is proposed in which the bias 
between the computer model and the physical system is modeled as a 
realization of a Gaussian process. The application of classical statistical 
inference to this statistical model yields a rigorous method for calibrating 
the computer model and for adding to its predictions a statistical correc- 
tion based on experimental data. This statistical correction can substan- 
tially improve the calibrated computer model for predicting the physical 
system on new experimental conditions. Furthermore, a quantification 
of the uncertainty of this prediction is provided. Physical expertise on 
the calibration parameters can also be taken into account in a Bayesian 
framework. Finally, the method is applied to the thermal-hydraulic code 
FLICA 4, in a single phase friction model framework. It allows to improve 
the predictions of the thermal-hydraulic code FLICA 4 significantly. 
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1 Introduction 



In physics and engineering, computer experiments have been used for a long time 
as surrogates to costly, or unpractical, real experiments. However, the impact of 
the substitution of a computer experiment to an actual experiment is not easy 
to assess. In general, this substitution induces a double error. The first error is 
that a computer experiment is the numerical implementation of a mathematical 
model, so that a bias always exists between this mathematical model and the 
computer model. The quantification and reduction of this first error is the field 
of model verification [1. . The second error is that the mathematical model itself 
may not represent perfectly the underlying physical phenomenon. This second 
error defines the field of model validation. A reference book on model validation 
is e.g. [2]. 

In the present work, and similarly to several references on statistical analysis 
of the validation problem [3], we assume that the computer model has already 
been verified. Hence, we focus on the validation problem which is a very im- 
portant issue in nuclear engineering l-^. Nevertheless, our objective is less to 
study the validity of the computer model than to improve the computer model 
predictions, and quantify the uncertainty obtained, by assimilating experimen- 
tal results. A recent reference on demonstrating, or refuting, the validity of the 
actual computer model would rather be [3]. 

In practice, the study of computer model predictions is complicated by the 
fact that a computer model often comes with fitting parameters. These param- 
eters either allow to model a physical system more accurately or are a conse- 
quence of uncertainties with respect to some physical parameters. Hence, the 
analysis of the predictions of the computer model often needs to be carried out 
simultaneously with a calibration analysis. The goal of the calibration analysis 
is to quantify the uncertainty related to the fitting parameters. An example of 
a methodology carrying out calibration and prediction analysis globally is the 
Best-Estimate methodology [SJ [3]. This methodology proceeds through data 
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assimilation and takes into account the different sources of uncertainties coming 
from the model parameters, the experimental errors, the numerical errors, and 
the limitations of the physical models. 

In this work, a computer model is a function fmod of the form fmod{x,l3) '■ 
jjd ^ jjm _^ |[j xhis computer model is a representation of a physical system 
that is a deterministic function taking the form freai{x) : M"^ M. 

The scalar output of the physical system is the physical quantity of interest. 
It is a deterministic function freai of a vector x of input quantities, that we call 
experimental conditions. These components of the vector x can be divided 
into two categories. The first category contains the control variables. These 
variables define the physical system, independently of the environment in which 
the system is put. In engineering for instance, geometric parameters of the 
system can often be placed in this category, since they remain fixed regardless 
of what happens to the system. The second category contains the environment 
variables. These variables are the input of the physical system which values 
are not planned in the conception of the system. These variables are likely to 
be imposed beforehand by other systems. The distinction of the experimental 
conditions into these two categories is presented for instance in [7 section 2.1. 
To give an illustration, in the system design phase, the environment variables 
are set by the future use of the system, while the control variables are the free 
parameters that may be set through an optimization phase. 

The function freai of the physical system can not be evaluated for all the 
experimental conditions. Hence, this function is approximated by the computer 
model fmod- This function shares the same input vector x as the physical sys- 
tem and provides the same scalar output. Furthermore, the function fmod has a 
second kind of inputs, denoted by the vector /3. The components of this vector 
are the fitting parameters of the computer model fmod- These parameters are 
unnecessary to carry out an experiment of the physical system, but they are 
needed to run the computer model. Hence, these quantities are seen as degrees 
of freedom for the computer model, and allow it to give a good approxima- 
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tion of the physical system. In the sequel these parameters are called model 
parameters. 

The calibration problem is the problem of computing a value (3 for the 
model parameter (3. This is done on the basis of a set of experimental results. 
A set of experimental results is a set {a;^^-*, /o6s(aj'-^-'), a;^"-*, /ofesla^*-"-*)} where 
/ohs(a;^')) is the output of the physical system observed at the experimental con- 
dition a;'^'). Note that the quantity /obs(a;^*'') may be different from ,freai{x^^^), 
for example because of measurement errors. In practice, in nuclear engineering, 
calibration is an important issue because many computer models usually have 
model parameters that are totally or partially unknown [8]. 

Before presenting the Gaussian process modelling of this work, we emphasize 
the potential limitations of uncertainty quantification methods that would only 
address calibration. The most classical example of these methods is the least 
square calibration, which consists in minimizing with respect to /3 a quadratic 
misfit between the experimental results and the values that the computer model 
predicts when parameterized by (3. Hence, the value of (3 obtained with the least 
square method is e argmin^X]"^! {fmod{x'^''\ (3) - fobsix'-''^))'^ ■ A method 
addressing only the calibration problem, such as the least square calibration 
method, is generally based on two hypotheses. The first hypothesis is that the 
computer model is capable to perfectly reproduce the physical system. That is 
to say, there is a model parameter /3q so that \fx, freai{x) — fmod{x., (3^). This 
hypothesis means that the physical knowledge that was put in the computer 
model is sufficient to model the physical system perfectly. The second hypoth- 
esis is that the deviations (/,„od(a^^'\ /3o) ^ /o6s(a;^'^)) come from uncertainties 
related to the experiments. These uncertainties have generally two sources. 
First, the observations are affected by measurement errors. Second, there is 
a replicate uncertainty, meaning that the experimental conditions can not be 
known exactly for a given experiment. The main limitation is the assumption 
that the deviations fmod{x''^\ (3) — fobs{x'^^^) come only from uncertainties re- 
lated to the experiments. Indeed the order of magnitude of these uncertainties 
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is known. Hence, when the errors \fmodix^^\ l3) ~ fobs{x^^^)\ are too large com- 
pared to this order of magnitude, it indicates that there is a problem with this 
assumption (this can be quantified by Monte Carlo methods). In this case, the 
fact that the computer model can not represent the physical system perfectly 
needs to be taken into account. 

The limitation discussed above give motivations for reconsidering the as- 
sumption that there is a model parameter /3g so that Va;, freai{x) = fmod{x, /3q). 
In this paper, this is done by taking a model error into account. 

In Section [2] we present in detail the Gaussian process modelling of the 
model error, and show how this modelling yields a framework for calibration and 
prediction. We also give a one-dimensional illustration on an analytical function. 
In SectionjS] we present an application case, relevant to Nuclear Engineering, on 
the thermal-hydraulic code FLICA 4. The thermal-hydraulic code FLICA 4 is 
mainly dedicated to core thermal-hydraulic transient and steady state analysis 

2 The Gaussian process method for caUbration 
and prediction 

2.1 The Gaussian process modelling of the model error 

We present the Gaussian process model, that is the basis of the Gaussian process 
method for calibration and prediction. This statistical model is based on two 
main ideas: 

• The physical system x — > freai{x) docs not necessarily belong to the set of 
computer model functions {fmod{x, /3)}. We model the difference between 
the physical system and the correctly parameterized computer model by 
an error function that is called the model error. The notion of correctly 
parameterized computer model is explained below. 

• The model error function is not observable everywhere, and hence is un- 
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known for the majority of the experimental conditions. This lack of knowl- 
edge is modeled by the introduction of a stochastic framework for this 
function, that is to say it is represented by a realization of the random 
process Z{uj,x). This probabilistic modelling is a Bayesian modelling of 
the uncertainty on the deterministic model error function. The reader may 
refer to [7] p23, 24 for a discussion of Bayesian modelling of deterministic 
functions. In this context, the particular interest of Gaussian processes is 
discussed in [10] p2. Being the sum of the correctly parameterized com- 
puter model and of the model error function, the physical system itself is 
a realization of a Gaussian process. Hence, we do not use the notation 
freai{x) anymore for the physical system. Instead, we denote it by the 
random process Yreaii'-^, x). 

Motivated by these two ideas, the Gaussian process statistical model is de- 
fined by the two following equations 

Yreal{i^,x) ^ fjnod{x,/3) + Z{UJ,X) (1) 

and 

Yobs{^,x) ^Yreal{(^,x)+e{uj,x). (2) 

With: 

• a; in a probability space f2. 

• Yreaiii-^, x) thc random process of the physical system. 

• Z(uj, x) is the model error process. The random process Z is assumed to 
be Gaussian [7| [TTl [T2] and centered. The model error process Z is hence 
defined by its covariance function Cmod- In practice, this function belongs 
to a prescribed set of covariance functions (see e.g [TB] for classical exam- 
ples) and is defined up to a few hyper-parameters that can be estimated 
from data. 
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• /3 is the correct parameter of the computer model. We call it the correct 
parameter because, Z being centered, the computer model parameterized 
by /3 is the mean value of the physical system. 

• Yobs x) is the observed output of the physical system for the experimen- 
tal conditions x. This observation is the sum of the quantity of interest 
and of a measurement error e(io,x). e(uj,x) follows a Gaussian centered 
law, and is independent from one experiment to another. The variance of 
e is in general constant. 

In the sequel we do not write explicitly the uj. One could model Z as a white 
noise process (with independent components for different x), which would give 
a statistical model leading to the least square calibration of section [T] There 
are two reasons for not doing so, and use instead a covariance function with a 
dependence structure. 

• The physical system is generally continuous with respect to the experi- 
mental conditions, and so is the numerical model. Hence, as a difference, 
the model error process Z must be a process with continuous trajectories. 
This is not the case for a white noise process. 

• Similarly, it is expected that if the computer model makes a certain error 
for a given experimental point, then it will do a similar error for a nearby 
experimental point. This principle is taken into account by a covariance 
function with a dependence structure. 

The statistical modelling also allows to take into account expert judgments 
for the model parameter (3. This is done within the Bayesian framework, mod- 
elling the constant but unknown correct model parameter /3 as a random vector. 
The law of this random vector is known, and chosen according to the degree of 
knowledge one has about the model parameter (3. We use a Gaussian distribu- 
tion for the Bayesian modelling of /3. Hence, we distinguish two cases: 

No prior information case: /3 is a vector of unknown constants. 
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Prior information case: ^ is a random vector, with known mean vector 
prior ^-^d covariance matrix Qp^ior- 

In tliis paper, we work witli a linear approximation of tlie computer model 
with respect to its model parameters (within the range of values that is under 
consideration). Hence we consider computer models of the form fmod{x,f3) = 
fmod{x,(5^^^) + E™! - (^nom.i) wherc /3„„„ is the nominal vector 
around which the linear approximation is made. /3„o,„ is generally chosen by 
expert judgment or by previous calibration studies. We choose, for simplicity 
reasons, to remove the perfectly known quantities /3,jo„ and /mod(35, /3„o,„). 
Indeed, up to a shift with respect to /3 and fmod, we can consider that /3„o,„ = 
and frnod{x,f3^^^„J = 0. We then have 



The linear approximation is justified by a Taylor series expansion when the 
uncertainty concerning the correct parameter (3 is small. This linear approx- 
imation is frequently made, for example in thermal-hydraulics |14| [6], or in 
neutron transport |15j . A thorough discussion on the validity of using the linear 



The Gaussian process modelling allows to solve the two following problems: 

1. Calibration. It is the problem of estimating the correct model parameter 
/3, or equivalently to find the most accurate computer model function 

X ^ fmod{x^f3^. 

2. Prediction. For a new experimental condition Xnewj we want to predict 
the quantity of interest of the physical system, and add a measure of 
uncertainty to this prediction. The main idea is that the quantity of 
interest is not predicted by the calibrated computer model, because we 
are able to infer the value of the model error at Xnew 




(3) 



approximation in the non- linear case is given in section [2.51 
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The calibration and prediction are presented in section [2. 3| They are obtained 
using classical linear algebra tools, as long as the covariance function Cmod of 
the model error is known. In fact, the function Cmod depends on a set of hyper- 
parameters that are to be estimated from data. We present the estimation 
method in[2?2] 

Hence, in the most classical case, the Gaussian process modelling in treated 
in two steps. In a first step, the hyper-parameters of the covariance function 
are estimated so that this function is considered fixed in the second step, where 
linear algebra is used for the calibration and prediction. Note that there ex- 
ist methods where these two steps are done simultaneously, for instance if a 
Bayesian prior for the hyper-parameters of the covariance function is used. 
These methods are more costly, but can improve the quality of the Gaussian 
process modelling, as shown e.g in |16j in an optimization context. 

Note that in our case, a third step of verification is necessary. This step con- 
sists in verifying that the modelling leads to calibration and prediction that give 
satisfying results. This step can notably be carried out by Cross Validation, 
or using a new set of experimental results that was never used before. 

We now formulate the problem in vector-matrix form. Assume that n ex- 
periments are carried out at a;(^\ ...,a;^"). We denote: 

• The n X m matrix H of partial derivatives of the computer model with 
respect to f3 — /3„i). H is defined by Hij — hj{x'-'^^). 

• The random vectors y^^^ of the n observations. yobs,i is the result of the 
ith experiment. 

• The random vector e of the measurement errors for the n experiments. 
We have yobs,i = Yreaiix'-^^) + ^i- 

• The random vector z of the model error for the n experiments, z = 

• The covariance matrix of e: Times- 
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• The covariance matrix of z: '^^aod- 
For the n experiments the equations ([l]) and ([2]) become 

Vohs =B.(3 + z + €. (4) 

Hence we have a universal Kriging model (Tj. We denote by R the covariance 
matrix of the model and measurement error vector z + e. 

R. := COv{z + e) ='R„iod + 'Rrnes, (5) 

The distinction between Rmorf and Rmes is important because, usually 'R-mes 
is at least partially known from knowledge of the experimental process while 
Rjnod generally does not benefit from physical knowledge. Indeed, physical 
knowledge is used in the conception of the computer model, and hence may not 
help knowing the shape of the error of the computer model. 

We can compute the a priori law of the vector of observations, i.e. the 
statistical distribution of the observations before carrying the experiments, but 
given the hyper-parameters. In the no prior information case we have, with (3 
an unknown constant, 

~AA(H/3,R). (6) 
In the prior information case, we have, with f3 ~ N{(5j,j.^g^, Qprior) 

^ AA(H/3p„„„ HQp,„,H* + R). (7) 

Here Af{m, R) stands for the multivariate Gaussian distribution with mean 
vector m and covariance matrix R and ^ means "follows the distribution of. 

2.2 Estimation of the covariance of the model error 

For the Gaussian process modelling defined in ([ij and ^ to be tractable with 
closed form linear algebra formulas (subsection |2.3[) , it is necessary that the 
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covariance functions of the model error Z and of the measurement error e are 
known. We show here how to compute these covariance functions. 

The covariance function of the measurement error can generally be specified 
form physical expertise. This is the case here. If it is not the case, this function 
can, for example, be estimated in the same way as the model error covariance 
function. 

Generally there is no expert judgment available concerning the model error 
covariance function Cmodj as has been discussed above. A specific structure 
is chosen for Cmodt with a limited number of degrees of freedom. Hence we 
consider the family of covariance functions 



Here O is a subset of MP and Cmodfi is a stationary correlation function. We 
present classical correlation functions families, for which p = d, 6 = {lc,i, lc,d) 
and Cmod,e{x^°'\ x^''^) = Cmodfi{h) with h = a;^"^ — x^''h The component lc,i 
can be seen as a correlation length in the ith dimension. 

• exponential correlation function 



Cmod = {o'^Cmodfi, Cr > 0,9 £ &} . 




• Matern u 



= I correlation function, with l^le = 



CmodAh) = il + VQ\h\e)e^p{-V6\h\e) 



• Matern u 



= I correlation function 



CmodAh) = (1 + VlO\h\e + -j\h\l) exp {-Vw\h\e) 
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• Gaussian correlation function 

Cmo±g{h) = cxp(-|/l|g) 

The correlation functions above yield sample functions of increasing regularity 
(see for instance [H]). The importance of the regularity of the correlation 
function is presented in detail in 

Assume now that we have n experimental results y^^^ = {Uobs.ii ■■■lUobs.n) 
and recall the notations of Q and ([5]). As we have seen, Times is fixed, and 
R-mod depends on (cr^ , 6) that arc to be estimated. We use the notation Ha.e 
for the global covariance matrix R = R„iod + R-mes • 

There are several methods that can be used to estimate the hyper-parameters 
(cr^, 9) from data Vobs- The most widely used are Maximum Likelihood [17] and 
Cross Validation ^^9\. 

In this work, we use the Restricted Maximum Likelihood Estimator (RMLE) 
of {a^,6). This estimator is for instance presented in [20]. The advantage of 
this estimator is that the estimation of is independent of the estima- 

tion of /3. Furthermore, this method allows to have the same estimation of 
((T^,0) in both the prior and no prior information case. Finally, let us notice 
that n > m is required for the REML method, that is to say there are more 
experiments that model parameters. In thermal-hydraulics, the field of the ap- 
plication case, this condition generally holds in practice. Nevertheless, in other 
fields of Nuclear Engineering, typically in neutron transport ;15 , one may have 
m » n. In this case, if one want to address the present model error mod- 
elling anyway, it is recommended to work in a fully Bayesian framework, both 
for the model parameters and the covariance hyper-parameters as described in 
[7] section 4.1.4. Indeed, the very large number of model parameters makes 
the uncertainty related to the hyper-parameters of the model error covariance 
function too large to be neglected, as is done when these hyper-parameters are 
fixed to their estimated values. 
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Let W be a (n — m X n) matrix of full rank so that WH = 0. Notice that 
if H is not of full rank, then m must be replaced by the rank of H. Then 



The law of w is independent of the value of f3. Hence the RMLE (ct^, 6) is the 
Maximum Likelihood estimator on the transformed observations w: 



{a, 9) e arg min q{a, 6) (8) 



with: 

q{a, 9) = In |WR<,,eW*| + w^WR^^eW^^w. (9) 

It is shown in [21 that changing W only adds a constant (with respect to 
(cr^, 9)) term to ([o]). It is also shown in [5T] how one can avoid a matrix product 
with W. Indeed for W so that WW* = I„_™ and W*W = I„ - H(H*H)-iH* 
we have 

q{a, 0) = - In |H*H| + In |R,,e| + In |H*R;i H| + yl.^n^^eVobs, (10) 



with 



Let U, S, V be a Singular Value Decomposition of H, with U of size n x to so 
that U*U — Im.rm S a diagonal matrix of size to, with nonnegative numbers on 
the diagonal, and V an orthogonal matrix of size to, so that H — USV*. Then, 
we can show that 

qia, 9) = In |u*R-igU| + In |R,,e| + ylbsK^eVobs 

-y*,,R-iU(U*R-igU)-iU*R-iy„,,. (11) 

Hence, it does not matter if H is ill-conditioned, or even singular, since 
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its singular values are actually not used in the computation of the Restricted 
Likelihood. Using (111 allows both to avoid nx n matrix multiplications and 
to avoid numerical issues with respect to the condition number of H. 



2.3 Calibration and prediction 

Throughout this subsection, we assume that the covariance function Cmod of 
Z is estimated and fixed and we use the classical Kriging formulas to solve the 
calibration and prediction problems. The Kriging formulas, in both the Bayesian 
and frequentist cases, can be found in [7]. We will see that these formulas require 
R to be non zero, i.e there are model or measurement errors. For consistency, 
we first address the case R = 0, which is actually straightforward. If there is 
a unique (3 so that fmodix,f3) reproduces all the experiments, then (3 is the 
correct parameter with zero associated uncertainty. If there are more than one 
f3 so that fmodix, /3) reproduces all the experiments, then the computer model is 
redundantly parameterized or the number of experiments is insufficient. If there 
is no (3 so that fmod{x,f}) reproduces all the experiments, then the assumption 
of no model error and no measurement error is invalidated. In the sequel, 
we consider R non zero, and R invertible, which is the case for the classical 
covariance functions of section 12.21 

In the no prior information case, the calibration problem is the frequentist 
problem of estimating the unknown parameter /3. The maximum likelihood 
estimation of (3 is 

^ = (H*R-iH)-iH*R-iy„,,. (12) 
This estimator is unbiased and has covariance matrix 

COV0) = (H*R-iH)-\ (13) 

We see that if there is a /3 so that H/3 — Uobsj then we have = (3. This 
means that, if we are in the favorable case when the computer model can per- 
fectly reproduce the experiments, then the Gaussian process calibration of the 



15 



computer model will achieve this perfect reproduction, as should be expected. 
Finally, as the random vector (3 has Gaussian distribution, its covariance matrix 
is sufficient to yield confidence ellipsoids for /3. 

In the prior information case, the posterior distribution of /3 given the ob- 
servations j/ofcs is Gaussian with mean vector 

Impost f^prior ~^ (Qprior "I" H R, H) H R. {Vobs ^f^prior) ^ (l^) 

and covariance matrix 



Qpost = (Qp-L + H*R-iH)-i. (15) 

We can notice that, when Qp/jg^ — t- 0, then the prior information case calibration 
tends to the no prior information case calibration. This is an intuitive fact, 
because (^prior small corresponds to a small a priori knowledge of /3 and hence 
should, in the limit case, correspond to an absence of knowledge. 



Remark: The prior information case calibration of ( |14[ ) is classically used 
in neutron transport [15j . when the linear approximation ^ of the computer 
model is also made. In the reference hereabove, no model error is assumed, so 
that the physical system is predicted by the calibrated computer model only. In 
thermal-hydraulics, which is the field of the case of application, this hypothesis 
is not justified. Indeed, computer models can rely on aggregation of correlation 
models that have no physical justification. We will see in the prediction formulas 
of (16) and (18), and in the application case of section |3j that modelling the 



model error allows to significantly improve the predictions of a computer model 
that is only partially representative of the physical system. 

We now present the prediction formulas. In the same way as the computer 
model, the goal of the prediction is to give the most probable value of the phys- 
ical system, for a new experimental condition, without doing a real experiment. 
However, this most probable value is not necessarily given by the output of the 
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calibrated computer model, because the model error is inferred too. We now 
give the notations that we use for a new experimental condition Xnew 

• The random value of the physical system at Xnew- Yreaiixnew)- 

• The vector of derivatives of the computer model with respect to Pm 

at Xnew'- h(Xnew) ■ HenCC we have (h(Xnew)) ^ — hiiXnew) ■ 

• The model error at a;„e„: Znew 

• The covariance vector of the model error between (a;'^-*, a;'^"-') and Xnew- 

rmod{Xnew) givCU by {rmodiXnew))i COv(zi, Z^ew) ■ 

In the no prior information case, the Best Linear Unbiased Predictor (BLUP) 
of Yreai{xnew) with respect to the vector of observations y^i^^ is 



y(Xnew) = (h{Xnew)f^^ + (rmod (j^ne^, ))*R Hj/ofcs~H^) - 

Calibrated computer model Inferred model error 

(16) 

We refer to [221 for a detailed definition of the BLUP and the computation 



of the predictor in (16). This predictor is composed of the calibrated computer 



model and of the inferred model error. By inspection of ( 16 1, the inferred model 
error has the following properties: 

• Xnew being fixed, this term is large when the errors y^bs ~ H/3 between 
the experimental results and the calibrated computer model are large. 

• The observations being fixed, this term is a linear combination of the com- 
ponents of rmod{xnew)- Thcsc elements are usually a decreasing function 
of the distance between Xnew and the experimental conditions a;^*^ Hence, 
if Xnew is far from an experimental condition x^'^\ then the weight of this 
experimental result is small in the combination. Hence, the prediction of 
Yreaiixnew) is almost Only composed of the calibrated computer model 
when Xnew is far from any available experimental condition, while the 
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model error inference term is significant when Xnew is in the neighbor- 
hood of an available experimental condition (the neighborhood is defined 
in terms of the correlation lengths). 

The mean square error of the BLUP is 

Since only linear combinations have been used, the BLUP has Gaussian distri- 
bution and the mean square error allows to build confidence intervals. 

In the prior information case, the posterior distribution of Yreaiixnew) given 
the observations y^^^ is Gaussian with mean 



yi'^new) — (^(■^neu?)) Impost ^~ (^mod (■^neu? ) ) {Vobs -H^, 



post ) : 



calibrated computer model inferred model error 

(18) 



and variance 



^ iS^7ienj) C modi."^ newt new'} '^modi^'^new'} mod i,"^ new} 

+ iHXnew) - H*R-lrw(a;„e«,))*(H*R-lH + Q^^\^X\h{Xne^o) - H*R-V^od( 



We can make the same remarks as for (16). Similarly to calibration, the limit 



when Qp,,^jor — > of the prediction in the prior information case is the prediction 
in the no prior information case. 



2.4 Illustration on an analytical test case 

We illustrate the calibration and the prediction on an analytical test case. This 
test case is academic and allows to understand the most important features of 
the Gaussian process modelling. 

We study the case in which the physical system is the function a; — > a;^ on 
[0,1]. The computer model is frnod{x,(3) = f3o + /3ix. We assume that the 
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covariance function of the model error is known and has the Gaussian form 
Cmod{x — y) — fj^ exp (^— ^^^2^^ ^ with a = 0.3 and Ic = 0.5. There are three 
observations, noiseless, for experimental points 0.2, 0.5 and 0.8. 

The results in the no prior information case are presented in figure[T] We first 
see that there is a negative correlation in the estimation of (3. This correlation 
can be interpreted. Indeed if /3o, the value at of the line x — > /3o + is 
increased, then, for the line to remain close to the parabola x , the slope of 
the line must be decreased. Furthermore an important remark is that the 
calibrated line is above and does not go through the three observation points. 
This is surprising at first sight, all the more since the least square estimator of 
section [l] would go through the three points. This is because, as it is shown 
in ( |16[ ), the calibrated line is not intended to constitute a predictive model of 
the parabola. Indeed it is completed by the inferred model error from the three 
observation points. We see in figure [T] that the prediction curve approximates 
almost perfectly the parabola. Let us also notice that in the extrapolation region 
( < x < 0.2 and 0.8 < 2: < 1 ), the calibrated line approximates better the 
parabola than a line which would go between the three observation points. 

Hence, the inference of the model error improves the prediction capability of 
the calibrated computer model. This is all the more true as the physical system 
is predicted closer to the experiments. In extrapolation, the model error cannot 
be precisely inferred from the available observations and the inferred model error 



in ( 16 ) is hence very close to zero. Hence, in extrapolation, the prediction is 
made using the calibrated computer model only. This is as expected, because 
when one cannot statistically improve the prediction of the computer model, a 
conservative choice is to rely only to physical knowledge. Finally, we see that 
the confidence intervals (whose lengths are four times the standard deviations 



(17) which corresponds to 95% confidence) have length zero at the points where 
the noiseless observations are done, and that this length increases when one 
moves away from observation points. This shape of the confidence intervals is 
classical in Kriging with noiseless observations. 
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We also consider the prior information case with 
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The results for this case are shown in figure [2j Looking on the right plot, 
we can see that, from the prior (3 to the posterior /3, the line goes substantially 
closer to the three observation points. Nevertheless, it is not as close as in the no 
prior information case. This is a classical case in the prior information case (as 
well as in Bayesian statistics), when the observations and the prior judgment are 
in disagreement, the posterior /3 is a compromise between the observations and 
the prior judgment. Looking on the left plot, we see that a negative correlation 
between the two components of f3 appears in the posterior law of /3. 

Finally, the prediction of the physical system, and the confidence intervals 
are similar to the no prior information case. 

To conclude on the illustration on analytical functions, we see that the Gaus- 
sian process modelling has the potential to both improve the prediction capa- 
bility of the computer model and correctly assess the resulting uncertainty. In 
section |3] we confirm this on the computer model FLICA 4 9 , a thermal- 
hydraulic code relevant to core thermal-hydraulic transient and steady state 
analysis. Before this, we give general practical recommendations concerning the 
use of the Gaussian process modelling. 

2.5 General recommendations for the Gaussian process 
modelling 

The first important point is that, as stated in section [T] the method presented 
here does not address the complex field of code verification. As a consequence, 
discretization or numerical parameters, like the length or volume of a node, shall 
not be considered as model parameters or treated by the present method. 

Another important point is the linear approximation of (|3|. If the main 
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objective is to achieve a precise enough prediction of the physical system, and 
not to cahbrate the computer model, then it is not a problem if the computer 
model is not linear with respect to its model parameters. Indeed, the linear 
approximation boils down to modelling the Gaussian process Z in ([T]) as the 
model error of the linearized computer model in (|3| . In the prediction formulas 



(161 and (18 1, we see that the statistical correction can compensate for the 
linear approximation error of the code. This fact is confirmed in section [3] for 
the thermal-hydraulic code FLICA 4. The linear approximation yields a much 
cheaper method than similar non-linear methods [531 HI] > that may need to use 
Markov Chain Monte Carlo (MCMC) methods, and possibly to approximate 
the computer model by a surrogate model in both the x and (3 domain. Now, 
if calibration in itself is one of the main objectives, one should act with caution 
with respect to the linear approximation. In this case, we advise to run a 
sensitivity analysis first to check the linearity assumption (e.g the Morris method 
[25]). If the linearity assumption is infirmed, then we recommend to proceed in 
two steps. First, a non-linear calibration should be carried-out, like the least 
square calibration or a Bayesian calibration f24j. Then, the model parameters 
should be fixed to their calibrated values, or a very narrow prior, centered 
around these values, should be used, before using the present method. 

Concerning the computation of the derivatives with respect to the model 
parameters /3, two cases are possible. First the code can already provide them, 
by means of the Adjoint Sensitivity Method 2J. Similarly, automatic differ- 
entiation methods can be used on the source file of the code and yield to a 
differentiated code |26j. If these kinds of methods are not available, finite dif- 
ferences are necessary to approximate the derivatives. Our main advice here is 
not to use a too small variation step. Indeed, on the one hand, if the code is ap- 
proximatively linear with respect to the model parameters, a too large variation 
step will provide a good estimate of the derivatives anyway, whereas a too small 
variation step can yield numerical errors. On the other hand, if the code is not 
approximatively linear, the linear approximation should not be used for calibra- 
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tion. For prediction, the model error compensates for the hnear approximation 
error as weU as for the error in calculating the derivatives. 

The fourth important point is that extrapolation is not recommended. This 
is a general advice for all Kriging models. The experimental results should be 
made in the prediction domain of interest. Hence, for example, Kriging methods 
are not advisable to address scaling issues, that intrinsically ask to extrapolate 
experimental results from one scale to another. 

When dealing with more complex systems than the one of section |3j such 
as system-thermal hydraulics, one may deal with high-dimensional problems, 
either with respect to the number of experimental conditions (dimension of x) 
or to the number of model parameters (dimension of (3). The dimension of x 
is a potential problem. A common rule of thumb for Kriging models is that 
one should have n > 10dim{x). Note that screening methods exist and allow 
to select only the most impacting experimental conditions [27] . If the number 
of experiments is really too small compared to the number of experimental 
conditions, our opinion is that it is not possible to take into account the model 
error correctly, so that only the calibration should be carried-out. If (3 is high- 
dimensional, we advise, either to use a full Bayesian framework as described 
in section |2.2[ or to select only the most important model parameters (from 
physical expertise), and to fix the other model parameters at their nominal 
values. For example, in section [3j the less important parameters ai, Cf, n d are 
fixed to their nominal values. In this case the process modeling the model error 
also compensates for the error made by freezing these parameters. 
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3 Application to the thermal-hydraulic code FLICA 
4 



3.1 Presentation of the thermal- hydraulic code FLICA 4 
and of the experimental results 

The experiment consists in measuring the pressure drop in an ascending pres- 
surized flow of Uquid water through a tube that can be electricafly heated. This 
paper focuses on the frictional pressure drop (AP/ric) in a single phase flow. 

The thermal- hydraulic code FLICA 4 The mathematical model for APfric 
is given by the local equation 



^Pfr^c = T^G^hsofh- (20) 



In (20 1, each quantity is local. (20) is hence numerically integrated in space 



and time by the thermal- hydraulic code FLICA 4. In (20), is the friction 
height, p is the density, Dh is the hydraulic diameter, and G is the flowrate. 
fiso and fh are the friction coefficients respectively in the isothermal and heated 
flow regimes. The isothermal regime is defined by the temperature of the liquid 
being uniformly equal to the wall temperature. On the other hand, the heated 
flow regime is characterized by a heat flux imposed on the test section and thus 
a varying liquid temperature. In this work, we focus on the single phase case, 
and we study the isothermal and heated fiow subcases. 
The friction coefficient in the isothermal regime is 



fiso 



Re 



at 
Re''t 



if Re < Rei 

if Ret < Re (21) 



ai_ Ret-Re , at Re-Rei ■ r d Rp < Rpt 

Re Ret-Rei ^ ReH Ret-Rei " ^^1- ^ ^ 



where R^, = is the Reynolds number and /i is the viscosity. The limiting 
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values Rei and Ret for the Reynolds number are defined according to the lit- 
erature and represent the limits of the transition regime between laminar and 
turbulent flows, a; , at and bt are parts of the model parameters of the thermal- 
hydraulic code FLICA 4. They are the three components of the vector /3 of 
model parameters in the isothermal regime. 

The friction coefficient in the heated flow regime is a correction factor ex- 
pressed as 



where Ph and P^^ are the heated and wetted perimeters, T^, is the wall temper- 
ature, Tf, is the bulk temperature, and Tq = 100° C is a normalization tempera- 
ture. Cf,n and d are the three components of the vector f3 of model parameters 
in the heated flow case. Finally, note that tests with no heat flux (isothermal 
tests) result in = T^, therefore the correction factor is equal to 1, as 
expected. 

The experimental results Several experimental tests have been conducted 
in order to calibrate FLICA 4 friction model. These tests have been used in pre- 
vious calibration studies. The database is composed of rii measurements under 
isothermal conditions, and Uh measurements for heated tests. An experimental 
condition x consists in geometrical data (the channel width e, the hydraulic 
diameter D^, and the friction height Hj) and in thermal- hydraulic conditions 
(the outlet pressure Po, the flowrate G,, the wall heat flux the inlet liquid 
enthalpy hi, the thermodynamic title X^f^, and the inlet temperature Tj). For 
each test the pressure drop due to friction APfric is measured. 

3.2 Settings for the study 

Objectives We carry out the Gaussian process modelling method on the 
thermal-hydraulic code FLICA 4 in the isothermal and heated flow regimes. 
We limit the calibration part of the study to the parameters at and bt- That is 



Ph Cf{Tyj — Tf,) 



(22) 
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to say, we enforce the parameter a/ of the isothermal model, and the parameters 
Cf,n and d of the heat correction model to their nominal values, computed in 
previous calibration studies. Indeed, the parameters at and bt are the most 
influent parameters for the thermal-hydraulic code FLICA 4. 



We work in the prior information case (calibration given by ( 14 )). From pre- 
vious calibration studies, we have (ipj-ior — (0.22,0.21)*. Qp^ior corresponds to 
a 50% uncertainty and is chosen diagonal with diagonal vector (0.11^,0.105^)*. 
Hence, this prior is rather large, so that the calibration essentially depends on 
the experimental results. 

An important point is that the two categories of experimental conditions 
(control and environment variables, see section [T]) are not equally represented 
in the experimental results. The category of the control variables consists of 
the channel width e, the hydraulic diameter Dh, and the friction height Hf. 
The category of the environment variables consists of the outlet pressure Pq, 
the flowrate Gi, the wall heat flux 0^,, the liquid enthalpy h\, the thermody- 
namic title Xlf^, and the inlet temperature T^. The Ui -\- rih experiments are 
divided into eight campaigns. Within a campaign, the control variables remain 
constant, while the environment variables are varying. Hence, we only dispose 
of eight different control variables triplet. This means that, from the point of 



view of the prediction given by the Gaussian process model ( 18 1, it is a very un- 
likely that the prediction of the calibrated code is significantly improved when 
considering new control variables. We experienced that, when predicting for 
new control variables, the Gaussian process method does not damage the pre- 
dictions given by the nominal calibration of the thermal-hydraulic code FLICA 
4 but it does not significantly improve it. However, as we see next, we can 
give significantly improved predictions for observed control variables and new 
environmental variables. 

To conclude, this study follows the double objective of calibration and pre- 
diction, in the prior information case for the parameters at and bt- Concerning 
the prediction, the objective is to predict for experienced control variables and 
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new environment variables. 



On the different covariance functions The environment and control vari- 
ables listed above are not independent. Hence, it would be redundant to in- 
corporate all of them in the covariance function. One possible minimal set of 
environment and control variables is the set {Gi,Po,h\,(p^,Hj:, Dh). For this set, 
we will use the covariance function C, with C being one of the four covariance 
functions of page [12] 

To summarize, we represent the experimental conditions of an experiment 
by a; = {Gi, h{, Po, H f , Dh). The covariance function is Cmod{x^^\ x'-^'') — 
cr^C (a;*^^^ , a?^^'') , with C being either, the exponential, the Matern | , the Matern 
I or the Gaussian correlation function of section 2.2 The hyper-parameters to 
be estimated are the variance and the six correlation lengths lc,i, lc,6- 



Finally, we consider that the covariance matrix of the measurement error 
process is R,„es — <^mes^n, with ames — 150Pa provided by the experimentalists. 

Cross Validation It is well known, in the general framework of statistical 
prediction, that the quality of a predictor should not be evaluated on the data 
that helped to build it [5S] chapter 7. This is particularly true for the Gaussian 
process model, since it is based on the Kriging equations, that yields an interpo- 
lation of the observations when there is no measurement error. When a rather 
limited number of observations is available, as is the case here. Cross Valida- 
tion is a very natural method to assess the predictive capability of a prediction 
model. In our case, we are interested in the two following quality criteria for 
the Gaussian process predictor, 

RMSE' = „ E E (^c., i^) - Vobsix))^ (23) 



and 



E E {^)-yoU^)\<i.64(H^))c- ■ (24) 
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In (231 and (24 1, we use a K-io\d Cross Validation procedure, with K = 10. To 



do this, we partition the set of n experiments into ric — 10 subsets Ci, C„^, 



each subset being well distributed in the experimental domain. In ( 23 1 and 



(24), Ci^ is the set of experimental conditions and observations that is the 
union of the subsets Ci, Ci^_i, C^^+i, C„^. y^, (x) and {&{x))-^, are the 
posterior mean and standard deviation of the predicted output at x given the 
experimental data in C^^. [y^f. {x) — 1.64 ((T(a;))^, (x) + 1.64: {a (x))-^, ] 

corresponds to a 90% confidence interval. It is emphasized that at step ic of 
the Cross Validation, the Gaussian process model is built without using the 
experimental results of the class Ci^ . Hence the important point is that, in the 
computation of the posterior mean and variance of the observed value at x, this 
observed value is unused, for the estimation of the hyper-parameters as well as 
for the prediction formula. 

The Cross Validation presented here can yield a high computational cost, 
because one has to repeat the hyper-parameter estimation procedure Uc times. 
When these estimations are too costly, a simplified but approximate cross vali- 
dation procedure is possible in which the hyper-parameters are estimated only 
once for all. For this simplified version. Cross Validation is carried out only 



with respect to the prediction formulas of (18) and (19). Let us notice that, in 
this context, there exists formulas [53] that allow to calculate the result of the 
Cross Validation procedure without actually calculating K times the prediction 



formulas (16), (17), (18) and (19). These "virtual" Cross Validation formulas 
reduce even more the Cross Validation computational cost. Nevertheless, in our 
case, we are able to estimate the hyper-parameters at each step of the Cross 
Validation. Indeed, we have a rather limited number n of experimental results 
(the computation of the Restricted Likelihood is 0{n^)). 

3.3 Results 

Results in the isothermal regime In a first step, we consider the results in 
the isothermal and turbulent flow regime only. That is to say, the regime when 
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fh = 1 in (20), and when Re > Ret in (21). We have nu experimental results. 

The isothermal regime is characterized by no wall heat flux, = 0. Hence, 
it is useless to include it in the covariance function, because it is uniformly zero 
for all the experimental conditions. So, we only have five correlation lengths 
out of six to estimate, which are Ic^i, lc,3^ ^c,4j ^c,5 and Icfi corresponding to Gi, 
hi Po, Dn and Hf. 

On figure |3j we plot, for the 10-fold Cross Validation, the = 10 posterior 
mean values of at and ht for the four covariance functions of page |12[ The con- 
clusions are that the Gaussian process calibration does not change significantly 
the nominal values at — 0.22 and ht ~ 0.21. Furthermore we do not notice 
significant differences concerning the choice of the covariance function for the 
calibration. Finally, we can observe a high correlation in the posterior means 
of at and bt . This is confirmed in the posterior covariance matrix, where the 
correlation coefficient is larger than 0.95. 

Concerning the prediction, we first compute the RAISE and IC criteria 
for the four covariance functions. Results are presented in table |l] The first 



comment is that the predictive variances of ( 19 ) are reliable, because they yield 
rather precise 90% confidence intervals. This is also observed for Kriging, e.g 
in |12j . The second comment is that there is no significant difference between 
the different covariance functions. This may be due to the amplitude of the 
measurement error, which makes insignificant the problem of the regularity 
of the covariance function. It is shown in [llj Section 3.7 that, in a particular 
asymptotic context, even a small measurement error can have a significant effect 
on prediction errors. 

We now present more detailed results for the Matern | covariance function. 
We first compare the Gaussian process predictions with the predictions given 
by the calibrated code alone. With the same Cross Validation procedure, the 
RMSE criterion for the calibrated code alone is RMSE — 741Pa. This is to 
be compared with a RMSE around 300Pa for the Gaussian process method. 
Hence the inference of the model error process significantly improves the pre- 
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dictions of the code. We illustrate this in figure|4j where we plot, for each of the 
Uit observations, the predicted values and confidence intervals with the 10-fold 
Cross Validation method. The plots are done with respect to the experiment 
index. This index has physical meaning, because two experiments with suc- 
cessive indices are similar (for instance, the experiences of a given campaign 
have successive indices). We first see that the Gaussian process modelling sig- 
nificantly reduces the prediction errors, and that the confidence intervals are 
reliable. Then, we observe a regularity in the plot of the prediction error for the 
calibrated code, especially for the largest indices. This regularity is not present 
anymore in the error of the Gaussian process method. The conclusion is that 
the Gaussian process method detects a regularity in the error of the calibrated 
code, and uses it to significantly improve its predictions. 

Finally, in table|ll[ we show the ric — 10 different estimations of (cr^, /c,i, lc.3, lc,4, Ic,. 
for the different steps of the Cross Validation. The first conclusion is the singu- 
larity at steps 5 and 6 of the Cross Validation. The explanation is that, among 
the Uit experimental results, there are two singular points that have very sim- 
ilar experimental conditions but substantially different values for the quantity 
of interest. These two points are in CV classes 5 and 6. Hence the estimation 
of the hyper-parameters in the CV steps 1, 2, 3, 4, 7, 8, 9, 10, where this singu- 
larity is present in the data used for the estimation, is different from the steps 
5 and 6, where the singularity is absent. On figure [4j these two singular points 
yield the two largest prediction errors for the Gaussian process method. Indeed, 
when one of them is in the test group, the other is in the learning group. As 
the Gaussian process modelling principle is to assume a correlated model error, 
the quantity of interest of the singular point of the test group is (up to the 
measurement error) predicted by the quantity of interest of the singular point 
of the learning group. 

The correlation lengths in table [Tl] correspond to normalized experimental 
conditions varying between and 1. Hence, the second conclusion is that the 
estimated correlation lengths are rather large, corresponding to rather large 
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scales of variations of the model error, as discussed for figure |4j When an 
estimated correlation length is very large (larger than 10), it is equivalent to 
assuming than the model error is independent of the corresponding experimental 
condition. The third conclusion is that the estimations of the hyper-parameters 
can vary moderately among the Cross Validation steps. This is an argument in 
favor of reestimating the hyper-parameters at each step of the Cross Validation, 
because this takes into account these variations. Finally let us notice, that, for 
the Gaussian process model to be used for new experimental conditions, the 
hyper-parameters are to be reestimated with all the observations. 

Results in the single phase regime We now use all the experiments of 
the single phase regime (isothermal and heated flow regimes), that is to say 
n — ni + rill experiments. Hence, we estimate six correlation lengths for the six 
environment and control variables Gi, (j)^, h\, Pq, Dh and Hf. 

Concerning the prediction, we first compute the RMSE and IC criteria 
for the four covariance functions. Results are presented in table [iTll As in the 
isothermal case, we see that the predictive variances are reliable and that there 
is no significant difference between the four covariance functions. As for the 
isothermal regime, we present in more details the results for the the Matern | 
covariance function. 

With the same Cross Validation procedure, the RMSE criterion for the cal- 
ibrated code alone is RMSE — 567Pa. This is to be compared with a RMSE 
around 200Pa of the Gaussian process method. Hence the inference of the model 
error process significantly improves the predictions of the code, in the same way 
as in the isothermal regime. We illustrate this in figure [5] where we plot the 
same quantities as in figure [4j We obtain the same conclusion: the Gaussian 
process model detects a regularity in the error of the calibrated code, and uses 
it to improve its predictions. 

Influence of the linear approximation All the results above are obtained 
using the linear approximation of the thermal-hydraulic code FLICA 4 with 



30 



respect to at and bt- We have implemented the equivalent of the calibration 



and prediction formulas of (14) and (18), when the thermal-hydraulic code 
FLICA 4 is not considered linear with respect to at and bt [3]. Integrals in the 
at,bt domain were calculated on a 5 x 5 grid, which, to avoid bias, was also 
used when the linear approximation of the thermal-hydraulic code FLICA 4 
was used. Using the same 10-folds CV procedure as before, in the single phase 
regime, we obtain RMSE ~ 197.8 with the linear approximation and RAISE — 
196.9 without the linear approximation (less than 1% relative difference). The 
posterior means of at and bt, along the different CV steps, have a Root Mean 
Square Difference of 0.025 (more than 10% relative difference), between the cases 
where the linear approximation was made or not. Hence, this is an illustration 
of the general remark of section |2.5[ if the computer model is non-linear with 
respect to its calibration parameters, it is the model error with respect to the 
linearized computer model that is inferred. Thus, the predictions of the physical 
system are similar, whether or not the linear approximation is made. 



4 Conclusion 

In this work, a Gaussian process modelling method has been presented for com- 
puter model calibration and improved prediction of the underlying physical 
system. It is based on a modelling of the model error, which is the bias between 
the computer model and the physical system. A set of experimental results on 
the physical system is used, which enables to infer the model error for each new 
potential experimental point. As a result, an improved prediction for the value 
of the physical system, and an associated confidence interval, are provided. 

The Gaussian process modelling method is carried out in two steps. In a 
first step, the covariance function of the model error is estimated, based on 
the comparison between experimental results and the computer model. In this 
paper, the estimation is done with the Restricted Maximum Likelihood method, 
although the possibility of using other methods is discussed. This estimation 
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step yields the main computational cost, since one needs to minimize a function 
involving a matrix inversion. The size of this matrix is equal to the number of 
experimental results. 

Once the estimation is done, calibration and prediction can be carried out 
with closed form matrix-vector formulas. The calibration is the computation of 
the best parameters for the computer model. Physical knowledge on the calibra- 
tion parameters of the physical model can be taken into account in a Bayesian 
framework. The prediction is the computation of a predicted value and of an 
associated confidence interval for each new potential experimental point. The 
predicted value is the sum of the calibrated computer model and of a Gaussian 
inference of the model error. Hence, the calibrated computer model is com- 
pleted by a statistical term. This statistical term is based on the experimental 
results, and can significantly improve the predictions of the computer model. 
The closed form linear algebra formulas for calibration and prediction rely on 
a linearization of the computer model, with respect to the model parameters, 
around a reference parameter. These formulas can still be used when the lin- 
ear approximation does not hold, in which case, the calibration will be carried 
out on the linearized computer model, and the model error will incorporate the 
linear approximation error. It is shown that the linear approximation has no 
consequence on the prediction, but shall be treated carefully if calibration is one 
of the main objectives. The Gaussian process modelling of the model error can 
be carried out without linearization of the computer model |231 124j . but this 
yields a much more costly computation. 

The method is applied to the friction model of the thermal-hydraulic code 
FLICA 4, for which the data of several experimental campaigns are available. 
We evaluate the prediction capability of either the calibrated code alone or the 
Gaussian process modelling method. This evaluation is done rigorously using 
a ten-fold Cross Validation on the experimental results. It is shown that the 
error of the thermal-hydraulic code FLICA 4 can be divided by a factor between 
two and three. We also study different covariance functions for the model error. 
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and come to the conclusion that, due to the measurement errors, the choice 
of the covariance function does not have significant influence on the prediction 
capability in this case. 

Based on this case study, we believe the Gaussian process modelling of the 
model error to be promising in the field of computer model validation for Nuclear 
Engineering, by its ability to complete a computer model with a statistical 
inference of the model error. 

An interesting area of research is the implementation of this method to 
functional output computer models, arising for example in the case of time 
dependent problem. In the general context of Kriging with functional output, 
two kind of methods exist. The first solution is to consider a joint covariance 
structure, with respect to the inputs, and with respect to the functional output 
time or space parameter I30j . The second solution is to use a low-dimensional 
representation of functional outputs, such as PCA or wavelets, and to build a 
Kriging model for each of the coefficients of the representation. The adaptation 
of these methods to the Gaussian process modelling of the model error, in a 
Nuclear Engineering context, may motivate further research. 

5 Acknowledgments 

We thank the two anonymous reviewers for their comments and suggestions, 
which helped to improve the quality of the manuscript. 



33 



Covariance function 


RAISE {Pa) 


IC 


exponential 


289.5 


0.93 


Matern | 


296.2 


0.92 


Matern | 


302.7 


0.89 


Gaussian 


310.8 


0.88 



Table I: Prediction results in the isothermal regime. 



RMSE and IC criteria of (23) and (24) obtained with a 10-fold Cross Validation 



procedure, for the covariance functions presented in section [Z2| 
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Cross Validation step 


a 




lc,3 


IcA 


lc,5 


lc,6 


1 


2220 


2.3 


4.0 


100 


0.40 


53 


2 


2100 


2.2 


3.5 


100 


0.40 


100 


3 


2088 


2.1 


3.8 


100 


0.39 


100 


4 


2266 


2.3 


2.0 


100 


0.50 


100 


5 


4491 


3.4 


100 


24 


1.36 


100 


6 


1953 


1.6 


15 


3.4 


7.7 


0.6 


7 


2385 


2.4 


4.6 


100 


0.44 


100 


8 


2436 


2.4 


4.8 


100 


0.45 


99 


9 


2331 


2.4 


4.2 


100 


0.43 


100 


10 


2294 


2.4 


3.8 


100 


0.42 


100 



Table II: Estimated hyper-parameters in the isothermal regime. 

Estimated correlation lengths for the Matcrn | covariance function of section 

|2.2[ for the 10-fold Cross Validation procedure. 
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Covariance function 


RAISE {Pa) 


IC 


exponential 


202.2 


0.95 


Matern | 


196.2 


0.95 


Matern | 


196.9 


0.95 


Gaussian 


199.5 


0.94 



Table III: Prediction results in the single phase regime. 
Same setting as in table |Tj 
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Figure 1: Manuscript title: Calibration and improved prediction of computer 
models by universal Kriging. Authors: Frangois Bachoc, Guillaume Bois, 
Josselin Garnier and Jean-Marc Martinez. 



Calibration and prediction in the no prior information case. Left: Iso-density 



curves of the probability density function for the estimation of f3, given by ( 12 ) 



and (13 1. Right: Calibrated line (12), real parabola, prediction ( [l6| and 95% 
confidence intervals of the form [y{Xnew) — l-96(7(a;„euj)j ^(sJnetu) + l-96tT(a;„eio)]- 
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Figure 2: Manuscript title: Calibration and improved prediction of computer 
models by universal Kriging. Authors: Frangois Bachoc, Guillaume Bois, 
Josselin Garnier and Jean-Marc Martinez. 



Calibration and prediction in the prior information case. Top left: iso- 
density curves of the prior probability density function of /3. Bottom left: 
iso-density curves of the posterior probability density function of (3 given by 



(14 1 and ([T5|. Right: Calibrated line with the prior (top) and posterior (bot- 



tom, ( 14 )) mean values for /3, real parabola, prediction ( 18 1 and 95% confidence 

intervals ([l9]) of the form [y{Xnew) - '^■^Q^{Xnew),V{Xnew) + lMcr{Xnew)]- 
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X Exp Tens 

® Matern 3/2 

♦ Matern 5/2 

♦ Gaussian 



X 



.15 0.16 0.17 0.18 0.19 0.20 0.21 0.22 0.23 0.24 0.25 

at 



Figure 3: Manuscript title: Calibration and improved prediction of computer 
models by universal Kriging. Authors: Francois Bachoc, Guillaume Bois, 
Josselin Garnier and Jean-Marc Martinez. 



Calibration in the isothermal regime. 10-fold Cross Validation. Plot of 



the ric = 10 posterior means (14 1 of at and bt for the exponential, Matern 3/2, 



Matern 5/2 and Gaussian covariance functions of section 2.2 
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Figure 4: Manuscript title: Calibration and improved prediction of computer 
models by universal Kriging. Authors: Frangois Bachoc, Guillaume Bois, 
Josselin Garnier and Jean-Marc Martinez. 



Prediction errors (observed values minus predicted values (18)) and 90% 
confidence intervals for these prediction errors, derived by the calibrated 
thermal- hydraulic code FLIC A 4 (left), and the Gaussian process method 
(right). 90% confidence intervals are of the form [— 1.65(T(a;), 1.65(T(a;)] -with 



^{x) given by (19). Plot -with respect to the index of experiment. 
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Figure 5: Manuscript title: Calibration and improved prediction of computer 
models by universal Kriging. Authors: Francois Bachoc, Guillaume Bois, 
Josselin Garnier and Jean-Marc Martinez. 



Same settings as in figure [4| but in the single phase regime. 
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